perm filename NOTES.PDQ[LSP,JRA]6 blob
sn#237460 filedate 1976-09-18 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .SS(Syntax-directed I/O,SDIO,P105:)
C00016 ENDMK
C⊗;
.SS(Syntax-directed I/O,SDIO,P105:)
.BEGIN TURN ON "←";
It is frequently quite desirable and convenient to enter input and
receive output in something other than S-exprs.
Recall our diagram on {yon(P46)}.
%3eval%* demonstrated that at least one style of evaluation can be mechanized.
What we wish to do now is examine the possibility of mechanizing the
encoding of the input and the decoding of the output.
Consider for example, the problem of simplification of algebraic expressions.
Many rather sophisticated simplifiers have been written
(⊗↑[Hea#68]↑, ⊗↑[Mac#xx]). Assume that we have
one named %3simplify%* which expects S-expr input and gives S-expr output. Thus
for example:
.BEGIN SELECT 3;CENTERIT;
←(3+4)*x + x =%4I %*=> (PLUS (TIMES (PLUS 3 4) X) X) =%4simplify%*=> (TIMES 8 X) =%4O %*=> 8*x.
.END
We would like transformations I and O done automatically. M-expr notation is
a candidate for such a task.
.BEGIN SELECT 3;CENTERIT;
←cons[A;B] =%4I %*=> (CONS (QUOTE A)(QUOTE B)) =%4eval%*=> (A . B) =%4O %*=> (A . B)
.END
Transformation O is the identity in this case.
In many cases the automatic generation of the I and O transformations can be
done. Such a program, SDIO (Syntax#Directed#Input#Output),
was written by Lynn Quam of the Stanford AI Laboratory
in 1968 ⊗↑[Qua#68]↑.
It was the forerunner of the more ambitious MLISP2 project⊗↑[Mli#72]↑.
The basic idea is simple enough. We will assume that the input and output
syntax is specified in BNF. With each BNF equation we will associate
semantics describing the S-expr representation. The input transformation (parser)
will
use this information to build the representation; and the output transformation
(unparser) will map the internal representation back.
The importance of syntax directed I/O should not be minimized. One aspect of
SDIO is the ablility to define new data types in LISP. Assume we wish to
represent a structure as a particularly horrible list structure. We can give
augmented BNF equations specifying the external representation and the translation
to the underlying representation. Clearly when outputting these structures
we do not want to see the internal representation. This can be particularly
annoying when we are debugging; when we are trying to concentrate on a misbehaving
algorithm we do not want to be distracted by incomprehensible output.
Thus syntax directed output, or unparsing, is at least as important as input.
Perhaps the easiest introduction to SDIO is to examine an example.
Consider the proposed
simplification task above. The "natural" input syntax can be described in BNF.
We have given closely related syntax equations
on {yon(P68)}. We will display a few equations augmented by SDIO semantics.
.GROUP;
For example:
.BEGIN TABIT2(17,55);
<EXP>\::= <EXP> + <TERM>\=>(PLUS EXP TERM)
\::= <TERM>\=>*
<TERM>\::= <NUMBER>\=>*
.END
.APART
To the input parser the first BNF equation means: whenever the right hand side
is recognized, reduce that occurrence to the left hand side and associate with it
the list consisting of the atom PLUS, the S-expr associated with the occurrence of
<EXP>, and the S-expr associated with the occurrrence of <TERM>.
The second equation means reduce <TERM> to <EXP> associating whatever S-expr
is attached to <TERM> with that occurrence of <EXP>. In the third equation
we assume that <NUMBER> is a syntactic type recognized by the scanner, and return
that number as the semantic value.
For example, if such a parser were given 2+3+44 it should return the list
%3(PLUS (PLUS 2 3) 44)%*.
The unparser uses these equations in the inverse manner. It will see a S-expr
and will attempt to match that to the description of the semantics, outputting
an instance of the BNF if successful.
The SDIO program will take such an augmented set of BNF equations and
generate a parser and an unparser for the language.
This project involves writing such a SDIO program. We describe a basic
SDIO program and suggest extensions and improvements.
The best way to describe the format of SDIO input is to give an SDIO
description.
.BEGIN TABIT2(17,55);GROUP;
<RULES>\::= END\=>NIL
\::=<RULE><RULES>\=>(RULE . RULES)
<RULE>\::= <LFPT><RTLST>\=>(LFPT RTLST)
<RTLST>\::= ::=<RTPT><SEXPR><RTLST>\=>((RTPT SEXPR) . RTLST)
\::= %Cε%*\=>NIL
<LFPT>\::= <<ID>>\=>*
<RTPT>\::= "=>\=>NIL
\::= <RPELEM><RTPT>\=>(RPELEM . RTPT)
<RPELEM>\::= <<ID>>\=>*
\::= <ID>\=>(SPWD ID)
\::= ""<CHAR>\=>(QCH CHAR)
\::= <CHAR>\=>(CH CHAR)
<SEXPR>\::= <ATOM>\=>*
\::= (<SEXPRLIST>)\=>*
<SEXPRLIST>\::= <ATOM> \=>*
\::= <SEXPR> <SEXPRLIST>\=>(SEXPR . SEXPRLIST)
\::= %Cε%*\=>NIL
END
.END
The expressions %3(SPWD ID), (QCH CHAR),%1 and %3(CH CHAR)%1 are S-expr
representations of calls on rountines to process special or reserved words,
quoted characters or special characters, respectively.
The input to SDIO is a sequence of augmented BNF equations terminated with END.
What the SDIO program sees is a S-expr representation of these equations.
The sample equations for <EXP> above would pass the following to the SDIO program:
.BEGIN TABIT3(10,13,21);GROUP;
\(\(EXP (\((EXP (CH +) TERM) (PLUS EXP TERM))
\\\((TERM) NIL)))
\\(TERM (\((NUMBER) NIL))) )
.END
What the SDIO program outputs are the parser and unparser.
The elements of the BNF equations in SDIO are rather standard: syntactic variables,
which are
identifiers bracketed by "<" and ">";
and special words, which are identifiers; and
special characters, which are preceeded by " if they conflict with the special
characters of the BNF.
The elements of the semantics are: unbracketed syntactic variables occurring
in the RHS of the associated BNF equation; other identifiers, taken as constants;
NIL, the LISP atom; notation for %3cons%*-ing, %3(###.###)%*; notation for
making a list, %3(e%41%*#...#e%4n%*)%1; the character *, described above.
.CENT(Project)
Write such a SDIO program. You should consult local LISP documentation
when building the basic I/O routines like <NUMBER>, <CHAR>, or <ID>.
.CENT(First Extension)
You may have noticed already that the basic SDIO program fails to
distinguish two occurrences of the same syntactic variable on the RHS of an
equation. Thus an equation like:
.BEGIN TABIT1(8);GROUP;
<ZIP>\::= <ZAP> <FOO> <ZAP> must be replaced by the pair:
<ZIP>\::= <ZAP> <FOO> <ZAP1>
<ZAP1>\::=<ZAP>
.END
This trick is called ⊗→stratification↔←. It is a syntactic trick, adding
nothing to the semantics.
Add notation to the semantics of your SDIO program to handle RHS with
multiple occurrences of syntactic variables. Modify your parser generators
accordingly.
.CENT(Second Extension)
Besides building a S-expr representation of the input, it is frequently
desirable to generate other information during the input parse. Lists of
occurrences of operators or other tables are commonly needed. The additional
information could be discovered by examination of the completed parse tree,
but that requires reexamination of the tree. It is much more efficient to
do as much as possible on a single pass.
Introduce notation which will allow execution of arbitrary LISP code
as the parse progresses. That code should be able to manipulate
any of the semantic properties associated with the syntactic variables appearing
in the RHS of the associated syntax equation.
.CENT(Third extension)
While it is obviously advantageous to produce output in the language
described by the BNF equations rather than the S-expr form, formatting
of the output can be equally beneficial.
We should like to be able to specify formatting information in SDIO
such that spacing and line-length are controlled.
One proposal is to embed spacing and line-feed control characters in the
BNF equations. The spacing character is "→" and the line-feed is "↓".
The "↑" sets the indentation point for the string on its right; and the "→"
followed by a digit says space over than number of spaces from the
last indentation point if the remaining space on the line is not sufficient to
contain all text specified by the remaining RHS of the equation.
"0→", meaning go to the indentation point can be written "→".
For example consider the following:
.BEGIN TABIT2(10,24);GROUP;
\<EXPR>\::= <ID>( ↓ <EXPR_LIST>)
\\::= <ID>
\<EXPR_LIST>\::= ↑<EXPR> , →<EXPR_LIST>
\\::= ↑<EXPR>
.END
.GROUP;
These equations, when used to drive an unparser, could give:
.BEGIN TABIT1(7);SELECT 3;
mumf(\a,
\foobaz(garp(b)),
\bletch(a,b,c),
\d)
.END
as the formatted version of:
.BEGIN CENTER; SELECT 3;
mumf(a,foobaz(garp(b)),bletch(a,b,c),d)
.END
Extend SDIO to handle formatting of output.
.APART
.END